Sample Complexity of Testing the Manifold Hypothesis
Abstract
The hypothesis that high-dimensional data tend to lie in the vicinity of a low-dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is independent of the ambient dimension of the space in which the data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity, which shows that this dependence on intrinsic dimension, volume, and curvature is unavoidable. Whether the known lower bound of $O\left(\frac{k}{\epsilon^2} + \frac{\log(1/\delta)}{\epsilon^2}\right)$ for the sample complexity of Empirical Risk Minimization on $k$-means applied to data in a unit ball of arbitrary dimension is tight has been an open question since 1997 [3]. Here $\epsilon$ is the desired bound on the error and $\delta$ is a bound on the probability of failure. We improve the best currently known upper bound [14] of $O\left(\frac{k^2}{\epsilon^2} + \frac{\log(1/\delta)}{\epsilon^2}\right)$ to $O\left(\frac{k}{\epsilon^2}\min\left(k, \frac{\log^4(k/\epsilon)}{\epsilon^2}\right) + \frac{\log(1/\delta)}{\epsilon^2}\right)$. Based on these results, we devise a simple algorithm for $k$-means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high-dimensional data, where the sample complexity is independent of the ambient dimension.
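To make "Empirical Risk Minimization on k-means" concrete, here is a minimal Python sketch, assuming squared Euclidean loss: the empirical risk of a set of $k$ centers is the mean squared distance from each sample to its nearest center, and ERM picks the centers that minimize it. Exact ERM for $k$-means is computationally hard in general, so this sketch swaps in Lloyd's algorithm with random restarts as a heuristic minimizer. It is not the algorithm devised in the paper, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(x: Point, c: Point) -> float:
    """Squared Euclidean distance between a sample x and a center c."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def empirical_risk(samples: List[Point], centers: List[Point]) -> float:
    """Empirical k-means risk: mean squared distance from each sample to its nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in samples) / len(samples)


def lloyd(samples: List[Point], k: int, iters: int = 50, rng=None) -> List[List[float]]:
    """Lloyd's algorithm (heuristic): never increases empirical risk, but may stop at a local minimum."""
    rng = rng or random.Random(0)
    centers = [list(p) for p in rng.sample(samples, k)]  # initialize at k distinct samples
    dim = len(samples[0])
    for _ in range(iters):
        # Assignment step: send each sample to its nearest current center.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for x in samples:
            j = min(range(k), key=lambda i: sq_dist(x, centers[i]))
            clusters[j].append(x)
        # Update step: move each non-empty cluster's center to the cluster mean.
        for i, cl in enumerate(clusters):
            if cl:  # empty clusters keep their previous center
                centers[i] = [sum(p[d] for p in cl) / len(cl) for d in range(dim)]
    return centers


def approx_erm_kmeans(samples: List[Point], k: int, restarts: int = 5, seed: int = 0):
    """Approximate ERM: keep the Lloyd solution with the smallest empirical risk over restarts."""
    rng = random.Random(seed)
    best, best_risk = None, math.inf
    for _ in range(restarts):
        centers = lloyd(samples, k, rng=rng)
        r = empirical_risk(samples, centers)
        if r < best_risk:
            best, best_risk = centers, r
    return best, best_risk


if __name__ == "__main__":
    # Hypothetical demo: 200 samples in dimension 50, scaled to stay inside the unit ball.
    rng = random.Random(42)
    dim = 50
    data = [[rng.gauss(0.0, 1.0) / (2 * math.sqrt(dim)) for _ in range(dim)] for _ in range(200)]
    centers, risk = approx_erm_kmeans(data, k=3)
    print("approximate ERM risk:", risk)
```

The abstract's point is that the number of samples needed for the empirical risk to approximate the true risk of the chosen centers does not depend on `dim`; the sketch only illustrates what quantity ERM minimizes, not that guarantee.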
Publication date: 2010